1. Introduction

The first part of the ‘International Handbook of Science Education’ (Fraser & Tobin, 1998) 
contains a whole section devoted to learning environments research. A conclusion that can be drawn 
from this section is that science education researchers have led the world in the field of classroom 
environment over the last two decades (Fraser, 1998; Waldrip & Fisher, 2000). Research in this field
has provided means of monitoring, evaluating and improving science teaching and curriculum. Fraser 
(1998), in his contribution to the handbook, suggests a number of developments that may advance the 
field of learning environments research in the future. One development is the use of multilevel
analysis techniques. Studies using such methodology have been scarce, and as a result the domain has
largely ignored the nested and hierarchical structure of most school and classroom environments. 
Another development is the undertaking of cross national research. According to Fraser, such research 
offers much promise for the future for at least two reasons: (a) in cross national studies usually greater 
variation can be found in the variables of interest, and (b) familiar practices, beliefs and attitudes can 
be exposed and questioned. While large-scale cross national studies, such as the Third International
Mathematics and Science Study (TIMSS), have been undertaken in science education, science teaching
practice in itself was not studied directly, due to a lack of suitable instruments. Clearly, the field is in
need of valid instruments that allow for cross national comparisons of science teaching.

In the Netherlands a particular development in classroom environments research occurred 
around interpersonal skills of science teachers (Wubbels, Creton & Hooymayers, 1985). Research has
shown that one of the most important problems many experienced and beginning teachers face is
maintaining order in the classroom (Veenman, 1992). Order, a critical component of classroom
atmosphere, is heavily influenced by the interpersonal skills of a teacher¹ (Creton, Wubbels, &
Hooymayers, 1989). With this in mind, Wubbels, Creton and Hooymayers (1985) developed their 
Model for Interpersonal Teacher Behavior (MITB), to map interpersonal teacher behavior extrapolated 
from the work of Leary (1957). 

This model (see Figure 1) maps interpersonal teacher behavior with the aid of two dimensions: 
an influence dimension (describing who is in control in the teacher-student relationship, the teacher or 
the student) and a proximity dimension (describing the degree of cooperation between teacher and 
students). The influence dimension is characterized by teacher dominance (D) on one end of the 
spectrum, and teacher submission (S) on the other end. Similarly, the proximity dimension is 
characterized by teacher cooperation (C) on one end, and by teacher opposition (O) on the other. The 
two dimensions can be depicted in a two-dimensional plane, that can be further subdivided into eight 
categories or sectors of behavior: leadership (DC), helpful/friendly behavior (CD), understanding
behavior (CS), giving responsibility/freedom (SC), uncertain behavior (SO), dissatisfied behavior
(OS), admonishing behavior (OD) and strictness (DO). Each sector can be described in terms of the
two dimensions: leadership, for example, contains a high degree of influence and some degree of
cooperation; helpful/friendly behavior some degree of dominance and a high degree of cooperation;
etc. There is evidence suggesting that the dimensions of the Leary model - hence: the Model for
Interpersonal Teacher Behavior - are cross-culturally generalizable (Lonner, 1980). Moreover, the
interpersonal dimensions are conceptually related to intercultural dimensions (den Brok, Levy,
Rodriguez, & Wubbels, 2002) such as immediacy (Gorham & Zakahi, 1990), approach-avoidance
(Andersen, 1985; Hecht, Andersen, & Ribeau, 1989), individualism-collectivism (Kim, Sharkey, &
Singelis, 1994) and power distance (Hofstede, 1991).

¹ The study of teaching is one of the areas of interest of the domain of learning environments research (Fraser,
1998). Apart from an interpersonal viewpoint on teaching, one can take many other viewpoints to analyze
teacher behavior, such as a learning activities perspective, a subject-content perspective, a moral perspective or
an organisational perspective (e.g. Brekelmans, Sleegers & Fraser, 2000; den Brok, 2001).



Figure 1. The Model for Interpersonal Teacher Behavior (MITB).

Based on the Model for Interpersonal Teacher Behavior, Wubbels et al. (1985) constructed the 
Questionnaire on Teacher Interaction (QTI). The QTI originally consisted of 77 items, answered on a 
Likert-type 5-point scale. The items of the QTI refer to the eight sectors of behavior - leadership, 
helpful/friendly, understanding, giving responsibility/freedom, uncertain, dissatisfied, admonishing 
and strict - that jointly make up the MITB. Since its development, the QTI has been the focus of well 
over 120 (learning environment) studies in many countries (den Brok, Brekelmans, Levy, & Wubbels, 
2002) and has been translated into more than 15 languages (Wubbels, Brekelmans, van Tartwijk, & 
Admiraal, 1997). The original QTI, designed for secondary education, also formed the basis for a 
number of other versions for primary education, higher education, principals and supervisors (den 
Brok, 2001). 

In the present study, QTI data of Science teachers are compared for six countries: the
Netherlands, USA, Australia, Slovakia, Singapore, and Brunei. The construction process of the QTI 
differed between these six countries. The QTI was first constructed in the Netherlands between 1978 
and 1984 (Wubbels, et al., 1985). Its development involved four rounds of testing (using different sets
of items), interviews with teachers, students and teacher educators, and researchers judging the face
validity of items. In this manner, out of a pool of over 200 items, 77 items were selected for the final 
version. The American version was created between 1985 and 1987 by translating the set of 77 items 
from the Dutch version, adding several items (since several items could be translated in more than one 
way), and adjusting this set of items based on three rounds of testing (Wubbels & Levy, 1991). 
Ultimately, the American version contained 64 items. The American 64-item version of the QTI was
initially also used in Australia (Wubbels & Levy, 1993), but ultimately Australian researchers ended 
up with a more economical 48-item selection (Fisher, Henderson, & Fraser, 1995). The Slovak version 
consisted of a translation of the American QTI, with other items added that better represented the 
Slovakian context. Ultimately, this resulted in a version with 64 items that was different from the 
American version (Gavora, Marek, & den Brok, in press). The Australian 48-item version, in turn, 
formed the starting point for the Singapore and Brunei versions, both consisting of the same number of 
items. The Singapore secondary education version consisted of the exact same set of 48 items in 
English (Fisher, Chiew, Wong, & Rickards, 1996). The Brunei version was developed by translating 
the 48 items of the Australian version into Malay and involving several people in checking the
back-translation into English (Scott & Fisher, 2000).

As can be seen from the process above, researchers employed several activities to ensure the 
quality of the QTI in their respective countries. First of all, they went through an elaborate and 
painstaking process of translation and back translation, in some cases interviews with teachers and 
students, several rounds of testing and observations in the classroom (Fraser, 2002). Second, 
researchers have assessed the degree to which the eight scales (or sectors) of the QTI in their countries 
were reliable (Cronbach’s alpha) at the student and teacher/class level and were able to detect 
differences between classes and teachers (den Brok, 2001). Third, many researchers computed 
(inter)correlations between the eight scales in order to see if these roughly represented a circular 
ordering (e.g. den Brok, 2001; Evans, 1998; Henderson, 1995; Rawnsley, 1997; Rickards, 1998; Scott, 
2001). In some cases, researchers established stability (Multilevel Lambda) (Snijders & Bosker, 1999) 
of the scales across classes (den Brok, 2001). In other instances, researchers computed dimension 
scores and correlated these (Levy, den Brok & Wubbels, 2001), to see if they were independent. A 
small number of studies conducted exploratory factor analyses (Soerjaningsih, Fraser, & Aldridge,
2002; Wubbels & Levy, 1993) or confirmatory factor analyses (den Brok, 2001; den Brok, Levy,
Wubbels & Rodriguez, in press; Wubbels & Levy, 1991) in order to see if two independent
dimensions underpinned the eight scales of their QTI. In some cases, statistical analyses were
performed on (aggregated) class level data, but in most cases on (individual) student level data. 

Using the above methods, results have been reported on the quality and validity of the QTI for
the Netherlands (den Brok, 2001; Wubbels & Levy, 1993), the USA (Wubbels & Levy, 1991; 1993), 
Australia (Fisher, Fraser, & Wubbels, 1993; Fisher, Henderson, & Fraser, 1995; Henderson, Fisher, & 
Fraser, 2000), the United Kingdom (Harkin, Davis, & Turner, 1999), Canada (Lapointe, Pilote, & 
Legault, 1999), Singapore (Fisher, Rickards, Goh, & Wong, 1997; Goh & Fraser, 1996), Brunei (Scott 
& Fisher, 2000), the Philippines (Oberholster, 2001), Israel (Kremer-Hayon & Wubbels, 1992), Hong 
Kong (Yuen, 1999), Korea (Kim, Fisher, & Fraser, 2000), Fiji (Coll, Taylor, Fisher, & Ali, 2000) and 
Indonesia (Soerjaningsih, et al., 2002). Only two studies are known to the authors that investigated 
cross-cultural validity of the QTI (Fisher, et al., 2000; Wubbels & Levy, 1991). These studies
reported small differences between countries in correlational patterns, outcomes of factor analyses, 
reliability and percentages of variance at the class level. One other study, on multicultural classes in 
the USA, investigated intercultural validity of the QTI (den Brok, et al., in press). This study showed 
that for four cultural groups - African-Americans, Asian-Americans, Hispanic-Americans and 
Caucasian-Americans - the scales of the QTI were ordered in a circular structure. However, the scales 
displayed relatively large dislocations from their theoretical positions in two of the four cultural
groups (the African-American and Hispanic-American groups). Thus, for these two groups the MITB 
received less support than for the other two cultural groups. 

While the field has gained much experience from the application of the QTI in different 
contexts (Fraser, 2002) and much is known about the quality of the QTI within various countries, there
is a need for a more elaborate cross-cultural investigation (see also Fraser, 1998; Wubbels &
Brekelmans, 1998). From a more methodological stance, earlier cross-cultural investigations were 
limited in terms of the countries involved (only two per study), while most of the studies investigating 
validity and reliability within separate countries used limited analysis techniques (discussed in a later 
section). 

Apart from the arguments above, research is in need of cross-culturally valid instruments, 
since they provide opportunity to compare practices between countries, also in the light of large-scale 
international educational (effectiveness) studies. Cross-culturally valid instruments may provide 
opportunity for assessment, self-evaluation and staff-development of teachers in international schools 
or multicultural education contexts. Finally, such instruments open up opportunities for researchers 
and other interested educators to work on joint research projects and use similar language, models and 
terminology. 

This paper compares science students' perception data from six countries and uses specific
analysis methods to analyze the validity and reliability of the QTI. In this way, it adds to the existing 
knowledge base and compensates for some of the limitations of earlier studies. The paper starts with a 
discussion on the MITB as a circumplex model with specific properties. Next, an overview of methods 
to analyse these properties is given. Then, earlier research on validity and reliability of the QTI is 
discussed in terms of these methods. Finally, the paper presents outcomes of analyses regarding
validity and reliability of the QTI using data from six countries. 

2. The circumplex Model for Interpersonal Teacher Behavior (MITB)

As stated before, the MITB describes interpersonal teacher behavior in terms of two dimensions 
(influence and proximity), that underlie eight behavioral sectors, ordered in a two-dimensional plane. 
The MITB is a special model because of its statistical properties and is theoretically linked to a 
particular branch of models called circumplex models. Circumplex models are based on a specific set 
of assumptions (Blackburn & Renwick, 1996; Fabrigar, Visser, & Browne, 1997; Gaines, Panter,
Lyde, Steers, Rusbult, Cox, & Wexler, 1997; Gurtman & Pincus, 2000). These are:

Assumption 1: the eight behavioral sectors (or scales) of the model are represented by two
dimensions (or factors).

Assumption 2: the two interpersonal dimensions that lie behind the sectors are uncorrelated².

Assumption 3: with the two interpersonal dimensions, the sectors (or scales) of the model can
be ordered in a circular structure.

Assumption 4: the sectors (or scales) of the model are equally distributed over this circular
structure.

Assumption 5: the sectors (or scales) occupy specific positions on the circle (as given in
Figure 1), which can be determined with a goniometric circle function.

Assumption 6: the sectors (or scales) of the model share similar amounts of variance (i.e. have
equal communality).

To test these assumptions behind the circumplex, psychologists have developed a number of statistical 
tests and procedures. 

Assumptions 1 and 2 are very general and are usually tested by applying (exploratory) factor
analyses or multidimensional scaling methods to the sector scores of individuals. With these methods,
it can be checked whether two or more factors underlie the sector scores. In these analyses, the optimal
number of factors or dimensions is determined by looking at the amount of variance explained and/or
the eigenvalues of the factors extracted. If further factors explain less than 10 percent of 'additional'
variance, have eigenvalues below one, or bring hardly any further drop in eigenvalues (this can be
seen by looking at the scree plot), the number of factors is optimal. Depending on the extraction and
rotation methods used, it can also be determined if these factors are independent (assumption 2)³. Of
course, the overall amount of variance explained by the factors should be as high as possible.
Assumptions 1 and 2 can also be tested by means of confirmatory factor analyses (using software such
as LISREL, EQS, AMOS, Mplus, Circum, etc.). The model specified in such analyses contains two
factors or dimensions, but allows for free estimation of factor loadings and of the correlation between
the two dimensions. To test the second assumption, the correlation between the two factors can be set
to zero. If model-fit indicators are sufficiently high, it seems very likely that two (independent) factors
structure the perception scores on the sectors. The factor loadings found can be used to establish
empirical locations of the sectors in the model. If only the first and second assumptions are tested and
confirmed, psychologists speak of a spatial representation model (Gurtman & Pincus, 2000), irregular
circumplex or non-circumplex model (Gaines, et al., 1997).

² Most variants of the circumplex assume two underlying dimensions. Since the MITB originates from Leary's
(1957) model, it is additionally assumed that these two dimensions are uncorrelated or independent. However,
this second assumption is not necessarily specified for other variants of the circumplex (Fabrigar, et al., 1997).

³ Principal component methods result in independent factors or dimensions. However, researchers are advised to
use maximum likelihood methods and rotation by hand, instead of principal component methods and varimax
rotation - which are default methods in most statistical packages - because these better account for circumplex
properties (den Brok, 2001).

Assumption 3 is more specific and incorporates assumptions 1 and 2 (i.e. it starts from the
premise that sector scores are represented by two uncorrelated dimensions). If the eight sectors are
ordered in a circular structure, this should be represented by their (inter)correlations. This means that
correlations between pairs of scales or sectors are greater for sectors closer on the (interpersonal)
circle, and smaller if they are more distant. Thus, correlations between opposing scales are smallest
(most negative), while correlations between neighbouring scales are highest (positive), and
correlations decrease in (equal) steps if one moves from neighbouring scales towards opposing scales
(Gurtman & Pincus, 2000; Tracey, 1994; Tracey & Schneider, 1995). Tracey (1994) developed a
statistical tool called RANDALL⁴ to analyse if a correlation matrix has a circular structure. This
program compares correlations pair-wise and checks if the correlation pattern described above can be
found in the data. For instruments with eight scales (or sectors), such as the QTI, 288 comparisons
between scales can be formulated and tested. The outcome of this software is a Correspondence Index
(or CI), that basically represents the proportion of correlations that is in accordance with the expected
circular ordering, and a p-value, that indicates how significant the index is. A CI-value of .5, for
example, means that as many comparisons are in accordance with a circular ordering as comparisons
that are not, while a value of .75 means that three times as many comparisons are in accordance as
comparisons that are not. Of course, for a circular structure, CI should be close to 1 and its p-value
should be significant. Browne (1992) developed another tool to test the circular structuring of sector
scores. He developed a software program that can test different circumplex models behind a
correlation matrix: CIRCUM⁵. The most relaxed variant that can be tested only assumes a circular
ordering⁶. By looking at the model fit indicators it can be determined if the model applies to the data.
The Browne method is more stringent than the method of Tracey, since it also implies that sectors
have equal distances to the circle center (Blackburn & Renwick, 1996). If the sector scores can be
structured in terms of a circle, psychologists speak of a circular order model (Gurtman & Pincus,
2000).

⁴ This software can be downloaded without costs from: http://courses.ed.asu.edu/tracey/rand.zip .

⁵ This software can be downloaded without costs from: http://quantrm2.psy.ohio-state.edu/Browne/ .

⁶ This model is also called the unequally spaced, non-equal communalities model.
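The pair-wise comparison logic described for assumption 3 can be illustrated with a small sketch. It is not the RANDALL program: it only enumerates the order predictions for eight sectors and computes the share that a correlation matrix satisfies, following the description in the text. The function names and the perfectly circular example matrix are assumptions made for illustration, and the sketch does not compute a p-value.

```python
import math
from itertools import combinations

# Sector order around the circle, as listed in the text.
SECTORS = ["DC", "CD", "CS", "SC", "SO", "OS", "OD", "DO"]

def circular_distance(i, j, n=8):
    """Number of steps between sectors i and j along the circle (1..n/2)."""
    d = abs(i - j) % n
    return min(d, n - d)

def order_predictions(n=8):
    """All (closer_pair, farther_pair) combinations implied by a circular order."""
    pairs = list(combinations(range(n), 2))
    return [(p, q) for p in pairs for q in pairs
            if circular_distance(p[0], p[1], n) < circular_distance(q[0], q[1], n)]

def proportion_met(corr):
    """Share of order predictions satisfied by the correlation matrix `corr`."""
    preds = order_predictions(len(corr))
    met = sum(1 for (a, b), (c, d) in preds if corr[a][b] > corr[c][d])
    return met / len(preds)

# Hypothetical, perfectly circular matrix: r = cos(angle between two sectors).
ideal = [[math.cos(math.radians(45 * (i - j))) for j in range(8)] for i in range(8)]

print(len(order_predictions()), proportion_met(ideal))
```

For eight sectors the enumeration yields the 288 comparisons mentioned in the text; with real data, `corr` would be the 8 x 8 matrix of scale intercorrelations in the sector order given by `SECTORS`.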

Compared to assumptions 1, 2 and 3, assumptions 4, 5 and 6 are even more specific, and 
usually involve less exploratory and descriptive methods of testing (Fabrigar, et al., 1997; Gaines, et 
al., 1997; Gurtman & Pincus, 2000). Of these last three assumptions, assumption 4 postulates the most 
relaxed circumplex structure. In this variant of the circumplex, it is assumed that sectors have equal
distances between them on the circle; however, their exact positions on the circle are not specified.
The CIRCUM software (Browne, 1992) can be used to test this model. If model fit indicators are 
sufficient, it seems likely that the scales have equal distances on the circle. This variant of the 
circumplex is called the equally-spaced circumplex (Gurtman & Pincus, 2000). 

In assumption 5, not only is it assumed that scales have equal distances on the circle, but their 
position on the circle is also fixed and can be determined exactly with the goniometric circle function 
(Gurtman & Pincus, 2000). The factor loadings in this model represent the positions displayed in 
Figure 1. The factor loadings for the influence dimension are .92 (DC and DO), .38 (CD and OD), -.38 
(CS and OS), and -.92 (SC and SO). For the proximity dimension, factor loadings are .38 (DC and 
SC), .92 (CD and CS), -.92 (OS and OD), and -.38 (SO and DO). This model can be tested with 
LISREL (Jöreskog & Sörbom, 1989) or Mplus (Muthén & Muthén, 1999) by specifying a two-dimensional
factor model with zero correlation between the factors and the exact (thus fixed) factor
loadings specified above. By looking at model fit indicators it can be determined if the model fits the 
data. This version of the circumplex is called the ideal circumplex (Gaines, et al., 1997). 
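The fixed loadings of the ideal circumplex follow from placing the eight sectors at 45-degree intervals on a unit circle. The sketch below reproduces them, assuming angles are measured counter-clockwise from the cooperation pole of the proximity dimension; the angle table and the function name are illustrative assumptions, not part of the original study.

```python
import math

# Assumed sector angles in degrees, counter-clockwise from the cooperation pole.
SECTOR_ANGLES = {
    "DC": 67.5, "CD": 22.5, "CS": -22.5, "SC": -67.5,
    "SO": -112.5, "OS": -157.5, "OD": 157.5, "DO": 112.5,
}

def ideal_loadings(angle_deg):
    """Return (influence, proximity) loadings for a sector at the given angle."""
    a = math.radians(angle_deg)
    # Influence is the vertical (sine) component, proximity the horizontal (cosine) one.
    return round(math.sin(a), 2), round(math.cos(a), 2)

for sector, angle in SECTOR_ANGLES.items():
    print(sector, ideal_loadings(angle))
```

Rounded to two decimals, the values match the loadings of .92 and .38 (with their signs) listed in the text.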

Finally, assumption 6 takes the premise that sector scores have been measured in a similar 
manner and share variance (communality). This assumption is special in nature, since it applies to the 
operationalisation of the model: variance that is unique for each sector is regarded as measurement 
error (Fabrigar, et al., 1997). The assumption can be tested within LISREL or Mplus by restricting the 
estimates of measurement error to be equal across sectors or in CIRCUM by specifying a model with 
equal communalities. The specifications can be added to any of the models testing for assumptions 1 
to 5. However, since questionnaires or other instruments almost never succeed in measuring sectors 
with the same amount of communality or error, this sixth assumption is usually not tested⁷.

⁷ Note that error may occur due to similar causes for any of the scales, resulting in (measurement) error
correlations. However, while errors may be related, they can still be different for each scale/sector. The
relatedness of (measurement) error can also be specified in confirmatory factor models. However, models with
many error (cor)relations are not very informative, and are thus often not put to the test.

Apart from the methods above, researchers have also developed a number of indices that allow
them to investigate the degree to which their empirical (circumplex) model, resulting from the
testing of assumptions 1 to 5, deviates from the (theoretical) ideal circumplex. First, one can
determine the empirical angular location of a sector, using the factor loadings that resulted from
(confirmatory or exploratory) factor analyses, and compare these to the ideal locations, resulting in
angular dislocations⁸ (Pincus, Gurtman, & Ruiz, 1998; Wagner, Kiesler, & Schmidt, 1995). Second,
one can compute the distance of a sector to the circle centre⁹ - also called vector length - and
determine the degree of variation in these (Blackburn & Renwick, 1996; Pincus & Gurtman, 1995). In
the ideal case, variation in vector length is minimal. Third, one can compute ideal and empirical
influence and proximity scores for each respondent and determine the correlation between both sets
(Blackburn & Renwick, 1996; Wiggins, Philips, & Trapnell, 1989)¹⁰. In the ideal case, correlations
between the ideal dimension score and its empirical equivalent should be very high (close to 1), while
correlations between both dimensions should be low (close to 0) and non-significant.
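The three indices can be sketched as follows. This is a minimal illustration under stated assumptions: `atan2` is used in place of a plain arctangent so that sectors in all four quadrants receive the correct angle, the angular deviation is taken as an absolute value wrapped to at most 180 degrees, and all function names are hypothetical.

```python
import math

# Ideal (influence, proximity) loadings per sector, as given in the text.
IDEAL = {
    "DC": (0.92, 0.38), "CD": (0.38, 0.92), "CS": (-0.38, 0.92), "SC": (-0.92, 0.38),
    "SO": (-0.92, -0.38), "OS": (-0.38, -0.92), "OD": (0.38, -0.92), "DO": (0.92, -0.38),
}

def angular_location(influence, proximity):
    """Angle in degrees (0..360), starting at the cooperation pole."""
    return math.degrees(math.atan2(influence, proximity)) % 360

def angular_agreement(empirical_deg, ideal_deg):
    """1 - dev/180, with dev the (assumed absolute, wrapped) angular deviation."""
    dev = abs(empirical_deg - ideal_deg) % 360
    dev = min(dev, 360 - dev)
    return 1 - dev / 180

def vector_length(influence, proximity):
    """Distance of a sector to the circle centre."""
    return math.hypot(influence, proximity)

def dimension_scores(scale_scores, loadings=IDEAL):
    """Weighted sums of the eight scale scores: (influence, proximity)."""
    influence = sum(loadings[s][0] * v for s, v in scale_scores.items())
    proximity = sum(loadings[s][1] * v for s, v in scale_scores.items())
    return influence, proximity
```

Passing empirically estimated loadings as `loadings` gives the empirical dimension scores, which can then be correlated with the ideal ones.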

3. Previous research on validity and reliability of the QTI 

As described earlier, many studies from several countries have reported on the reliability and 
validity of the QTI. A review of a selection of studies from the past 15 years shows that the reliability
of the scales (sectors) of the QTI is very high (den Brok, 2001): Cronbach’s alpha coefficients are 
roughly between .70 and .85 at the student level, and between .80 and .95 at the class level. 
Percentages of variance at the class level are also high, in most cases between 30 and 50. Confirmatory 
and exploratory factor analyses on the scale scores usually provide evidence for the existence of two
uncorrelated dimensions at the class level (den Brok, 2001; Wubbels & Levy, 1991). Additionally,
researchers report that their correlation matrices seem to be ordered in terms of a circular structure 
(Evans, 1998; Henderson, 1995; Rawnsley, 1997; Rickards, 1998). 
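For readers unfamiliar with the reliability coefficient reported here, Cronbach's alpha for one scale can be computed from item-level responses as sketched below. The function name and the tiny response matrix are hypothetical; the formula is the standard one based on item variances and the variance of the total score.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for one scale; each row holds one respondent's item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the respondents' total scale scores.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of three students to a two-item scale.
example = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(example), 2))
```

At the class level the same function could be applied to rows of class means instead of individual student responses.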

However, for data at the student level, researchers sometimes find three, or even more, factors 
or dimensions, and interpretation of factor loadings is not always straightforward (den Brok, 2001; van 
Tartwijk, Brekelmans, Wubbels, Fisher, & Fraser, 1998). Several explanations have been provided for 
the existence of three or more factors (den Brok, 2001): incompleteness of the current model for 
interpersonal behavior (an additional dimension, for example ‘degree of activity’, is needed), 
generality of the current model of interpersonal behavior (a dimension should perhaps be divided into 
two subdimensions that represent the poles separately), introduction of unintended (context) factors in 
the measurement methods, and methodological problems (resulting in measurement error). 



⁸ Angle A of a sector (with the cooperation pole of the proximity dimension as the starting point) can be
computed by the following formula: $A = \arctan(DS/CO)$, where DS and CO are factor loadings for the influence
and proximity dimension, respectively. With the ideal factor loadings, one can compute ideal angular locations
$A(i)$ in a similar manner. Angular dislocation can be computed with the formula: $A' = 1 - dev/180$, where
$dev = A - A(i)$.

⁹ Vector length $= \sqrt{\text{influence}^2 + \text{proximity}^2}$, where influence = factor loading of sector on
influence, proximity = factor loading of sector on proximity.

¹⁰ Ideal dimension scores are computed as follows: Influence = (.92*DC) + (.38*CD) - (.38*CS) - (.92*SC) -
(.92*SO) - (.38*OS) + (.38*OD) + (.92*DO); Proximity = (.38*DC) + (.92*CD) + (.92*CS) + (.38*SC) -
(.38*SO) - (.92*OS) - (.92*OD) - (.38*DO). In the empirical situation, .92 and .38 are replaced by values that
resulted from factor analyses.



While previous studies have shown that the QTI is a high quality instrument and appears to be a
valid representation of the Model for Interpersonal Teacher Behavior, most of the studies are subject to
shortcomings:

Methods were highly exploratory and descriptive in nature. If correlation matrices were
inspected for circumplexity (to "test" assumption 3), this was done subjectively and without
any statistical test. In many cases, exploratory factor analyses were used (to test assumptions 1
and 2). These kinds of analyses provided no (statistical) comparison with the ideal circumplex
model and no overall model fit indicators to test whether the data fitted the model specified,
and measurement error was not included in the models (or measurement error was
underestimated). Moreover, researchers used principal components extraction and varimax
rotation, which are not suitable for circumplex testing. In cases where confirmatory factor
analyses were performed, only assumptions 1 and 2 were tested (i.e. factor loadings were not
restricted but freely estimated).

Methods ignored the nested structure of the data. Most of the studies analyzed student
perception data, and analyzed these data at the individual (student) level. As a result,
correlations between scales were probably overestimated, because the methods (confirmatory 
or exploratory factor analyses, correlational analyses) assumed random sampling. If 
respondents are sampled in clusters (as is the case in most educational studies), their answers 
will be more similar than in randomly sampled situations, because they share a similar history 
and context (Hox, 1995; Muthen, 1994). Moreover, analyzing data on class level processes at 
the student level is questionable, since the model and its operationalization were specified and 
designed for the class level, rather than the individual level. 

Some studies aggregated student perception data to the class level. While these studies 
investigated data at the proper level, aggregation has serious drawbacks. It leads to loss of 
information and as a result lack of statistical significance in analyses (Hox, 1995). 

If data were analyzed at the student and class level simultaneously, the same structure was 
assumed at both levels, even though the MITB was formulated for the class level. 
Conceptually, at the individual level QTI data represent deviations of an individual student
from the class mean with respect to interpersonal teacher behavior. There may be various 
reasons for a student to differ from the class average, such as personal values and beliefs or 
even individual differences in treatment by the teacher (den Brok, 2001). Moreover, it has 
been shown that a two-dimensional circumplex model is not likely to support student 
perceptions at the individual level (den Brok, 2001). Other research shows that different 
structures may account for clustered data at different levels (Hox, 1995; Muthen, 1994). 

Due to the exploratory nature of factor analyses or inspection of correlation matrices, only
some of the assumptions of the MITB have been tested. Since only one method of analysis
was used in most cases, the studies also tested a single assumption at a time. Models found
with exploratory methods were not compared to the theoretical ideal by means of computing
angles or vector lengths (e.g. Blackburn & Renwick, 1996; Wiggins, et al., 1989).

Cross-cultural comparisons involved only two countries at a time. In these comparisons, factor
loadings and correlation patterns were compared subjectively. Model fit indicators were not
compared across countries.

The present study tries to overcome most of these limitations by comparing data of science teachers in 
six countries (Australia, Netherlands, USA, Singapore, Brunei and Slovakia), and by using different 
methods of analysis to test the first five assumptions behind the MITB. 

4. Research questions 

Using data from six countries (Australia, USA, the Netherlands, Slovakia, Singapore and
Brunei), the following research questions were investigated:

1) To what degree is the QTI capable of reliably and consistently measuring differences in
interpersonal style between science classes/teachers in six countries?

2) To what degree do science students' perceptions, measured with the QTI, represent the MITB
in each of the six countries?

5. Method 

5.1 Sample and instruments

To answer the research questions, QTI data were obtained from researchers that conducted
their studies in each of the countries of interest, and were then re-analyzed to meet the purposes of the 
present study. To enhance comparison between countries, researchers were asked to provide only data 
on secondary Science (Physics and Chemistry) teachers. Data were used from the Netherlands (919 
students rating 65 teachers), USA (800 students rating 40 teachers), Australia (726 students rating 35 
teachers), Singapore (1,713 students rating 50 teachers), Slovakia (490 students rating 18 teachers) and 
Brunei (644 students rating 35 teachers). 

Table 1 presents some characteristics for the samples in each of the countries. As can be seen,
samples included a considerably smaller number of schools in Australia, the USA, Slovakia and
Singapore than in the other countries. Moreover, average class size varied from country to country,
and ranged between 18 (Brunei) and 34 (Singapore) students. In all countries, the majority of the
classes involved grades 8 to 10. Average teacher experience was high in all countries, except for the
U.S. Some of the differences found reflected the sampling procedures used. The Dutch sample was
taken from a large-scale study involving many other research instruments. As a consequence, only half
of the students in a class completed the QTI. In all countries, convenience sampling was used, except
for the Netherlands, where teachers were randomly sampled. In the U.S. this convenience sample
comprised a selection of multicultural schools and beginning teachers, since data were gathered as part
of a professional development programme for teachers. In Slovakia, the sample also contained
Mathematics, Economy and Social Science teachers.



Table 1 

Characteristics of the Country Samples 

Country         N-students  N-classes  N-schools  Average group size  Grades  Teacher experience
Australia              726         35         12                20.7  8-10    High
United States          800         40          7                20.0  7-11    Medium
Netherlands            919         65         45                14.1  9       High
Slovakia               490         18         12                27.2  9-10    High
Singapore             1713         50          9                34.3  8-10    High
Brunei                 644         35         23                18.4  8-10    High

Students in each country completed the QTI in their respective languages. In Australia, 
Singapore and Brunei, a 48-item version of the QTI was used. The QTI in the USA and Slovakia 
consisted of 64 items, while the Dutch (original) version consisted of 77 items. In each country, the 
items referred to eight scales, representing the eight sectors of the MITB. For comparative purposes, 
the same 48 items as present in the Australian, Singapore and Brunei versions were selected from the 
Dutch, American and Slovakian data samples. According to Wubbels (1985, following Hui & 
Triandis, 1985), before cross-national equivalence of validity and measurement can be established, 
researchers need to show equivalence of instruments. Instrument equivalence requires that instruments 
are equivalent in terms of conceptual structure (conceptual or functional equivalence), are equivalent 
in terms of operationalization (construct operationalization equivalence), are equivalent in terms of 
items (item equivalence) and are measured on the same degree of intensity, magnitude or measurement 
range (scalar equivalence). In this study, all of these requirements have been met, although 
establishing conceptual equivalence in itself is part of the research questions. Sample items for each of 
the sectors (scales) are given in Table 2. 



Table 2 

Typical Items for the Scales of the QTI 

Scale (sector)                        Typical item
DC - leadership                       This teacher acts confidently.
CD - helpful/friendly                 This teacher is friendly.
CS - understanding                    This teacher is patient.
SC - student responsibility/freedom   We can influence this teacher.
SO - uncertain                        This teacher is hesitant.
OS - dissatisfied                     This teacher is suspicious.
OD - admonishing                      This teacher gets angry quickly.
DO - strict                           This teacher is strict.

5.2 Analysis procedure 

The analysis procedure involved a number of steps. To answer the first research question, 
sector scores were used (the average of the 6 items that pertain to one sector or scale). Next, intra-class 
correlations, consistency estimates (Multilevel Lambda; Snijders & Bosker, 1999; see footnote #1) and 
Cronbach’s alpha (at the class level) were computed for the sector scores. Intra-class correlations were 
computed with the SPLIT2 software (Hox, 1995; see footnote #2). Cronbach’s alpha and Multilevel 
Lambda were established with SPSS. 
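
The Multilevel Lambda formula given in the footnote, Lambda = Nj*ICC/(1+(Nj-1)*ICC), can be illustrated with a minimal Python sketch. The function name is hypothetical and not part of any software used in this study; the input values are the Brunei leadership (DC) figures reported in Tables 1 and 3. Results for other cells need not match the published values exactly, since those were presumably computed from unrounded intra-class correlations.

```python
def multilevel_lambda(icc, avg_group_size):
    """Consistency of a class mean: Nj*ICC / (1 + (Nj - 1)*ICC)."""
    return avg_group_size * icc / (1 + (avg_group_size - 1) * icc)


# Brunei, DC sector: ICC = .14 (Table 3), average group size = 18.4 (Table 1)
brunei_dc = multilevel_lambda(0.14, 18.4)
print(round(brunei_dc, 2))  # 0.75
```

The formula shows why consistency can be high even when the intra-class correlation is modest: with roughly 20 raters per class, an ICC of about .20 already yields a Lambda above .80.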

The second research question involved testing assumptions 1 to 5 on data from each of the six 
countries. The testing of these assumptions was done with different kinds of software, each of which 
analyzed intercorrelations between the sectors of the QTI. Unfortunately, some software packages, 
RANDALL and CIRCUM, cannot handle multilevel data. Therefore, class-level correlation matrices 
were computed that served as input for these programs. These matrices were so-called uncontaminated 
class-level correlation (or covariance) matrices. These matrices are different from matrices based on 
aggregated data in that they uniquely contain class level variance, rather than being a combination of 
individual and class variance (as is the case for aggregation). Uncontaminated class level correlation 
matrices take the nested structure of the data into account, meaning that in their computation no 
information is lost (which is the case for aggregation). To arrive at these uncontaminated class level 
matrices, we first used SPLIT2 to compute a class level and individual level correlation matrix for 
each country data set. The resulting class level correlation matrix, however, was still contaminated 
(since it still was a combination of student and class level variance; see Hox, 1995). Therefore, these 
matrices were entered into LISREL and a saturated model (see Hox, 1995) was specified with the 
multiple-group option (see footnote #3). The output of the LISREL set-up for these models then 
provided uncontaminated correlation estimates that could be used for further analyses. The 
uncontaminated class level correlation matrices for each country can be found in Appendix 1. 
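
The distinction between a contaminated and an uncontaminated class-level matrix can be illustrated with a small Python sketch. It is not the LISREL multiple-group procedure described above: it uses a simpler moment-based approximation, in which the scaled between-class covariance matrix is assumed to have expectation Sigma_W + c*Sigma_B, so that (S_B - S_PW)/c serves as a rough estimate of the class-level matrix. The function name and the toy data are hypothetical.

```python
def decompose_covariance(groups):
    """groups: list of classes; each class is a list of student rows (lists of floats).

    Returns (s_pw, s_b, sigma_b, c):
      s_pw    pooled within-class covariance matrix (student level)
      s_b     scaled between-class covariance matrix (contaminated)
      sigma_b moment estimate of the class-level matrix (uncontaminated)
      c       scaling constant, equal to the class size when classes are balanced
    """
    p = len(groups[0][0])
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand = [sum(row[j] for g in groups for row in g) / n_total for j in range(p)]

    s_pw = [[0.0] * p for _ in range(p)]
    s_b = [[0.0] * p for _ in range(p)]
    for g in groups:
        n_g = len(g)
        mean_g = [sum(row[j] for row in g) / n_g for j in range(p)]
        for j in range(p):
            for k in range(p):
                # Deviations of students from their own class mean
                s_pw[j][k] += sum((row[j] - mean_g[j]) * (row[k] - mean_g[k]) for row in g)
                # Deviations of class means from the grand mean, weighted by class size
                s_b[j][k] += n_g * (mean_g[j] - grand[j]) * (mean_g[k] - grand[k])

    s_pw = [[v / (n_total - n_groups) for v in row] for row in s_pw]
    s_b = [[v / (n_groups - 1) for v in row] for row in s_b]
    c = (n_total ** 2 - sum(len(g) ** 2 for g in groups)) / (n_total * (n_groups - 1))
    sigma_b = [[(s_b[j][k] - s_pw[j][k]) / c for k in range(p)] for j in range(p)]
    return s_pw, s_b, sigma_b, c


# Toy data: three classes of two students, two sector scores, no within-class variation
toy = [[[1.0, 2.0], [1.0, 2.0]], [[3.0, 6.0], [3.0, 6.0]], [[5.0, 4.0], [5.0, 4.0]]]
s_pw, s_b, sigma_b, c = decompose_covariance(toy)
print(c, sigma_b)
```

In the toy data all students within a class agree, so the estimate reduces to the ordinary covariance matrix of the class means; with real data the subtraction of S_PW removes the student-level part that aggregation would leave in.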

To test the first assumption, exploratory factor analyses (SPSS) were performed on the 
uncontaminated class-level correlation matrices, using Maximum Likelihood extraction (no rotation). In 
the analyses, only factors with an eigenvalue larger than one were selected. The optimal number of 
factors was determined by looking at the decline in eigenvalue (scree-plot). The first assumption was 
also tested by means of confirmatory factor analyses (Mplus), specifying two-factor models (in a 
multilevel set-up) with free factor loadings and correlation between the two factors (see footnote #4). 
Model fit was determined by looking at the Chi-squared/df ratio (should be non-significant), Root Mean 
Squared Error of Approximation (RMSEA, should be smaller than .05), Comparative Fit Index (CFI) and 
Tucker Lewis Index (TLI) (see footnote #5). 

Footnote #1: Lambda = Nj*ICC/(1+(Nj-1)*ICC), where ICC = intra-class correlation and Nj = average group size. 

Footnote #2: This software can be downloaded without costs from: http://www.Fss.uu.nl/ms/ih/papers/snlit2.exe . 

Footnote #3: In LISREL a multilevel model can be specified using the multiple group option. The model 
specifies the student level part in the first group and the class or teacher level part in the second group. 
Since the class level correlation matrix of SPLIT2 (or other software that can be used to decompose the 
data according to levels) still contains individual as well as class level variance, the student level part is 
also specified in the second group. This is done by constraining the student level part of the model (in this 
situation a saturated model) to be equal for both groups. The uncontaminated class level correlation matrix 
can then be found as the standardized psi matrix in the output. 

Footnote #4: For estimation purposes, a few estimates in the model need to be fixed, since if this is not 
done, models will be under-identified. In our case, the factor loading between the first factor and DC as 
well as the factor loading between the second factor and CS are set to 1. Also, the factor loading between 
the first factor and SO is set to -1. 

The second assumption was tested by means of confirmatory factor analyses with Mplus. 
Models were the same as for assumption 1 (two factors with freely estimated factor loadings), but the 
correlation between the two factors was set to zero. Model fit was determined in the same manner as 
was done for assumption 1. 

The third assumption was tested by entering the uncontaminated class-level correlation 
matrices of Appendix 1 into RANDALL. This resulted in Correspondence Indices (CI) for each matrix, as 
well as a probability estimate. Next, the numbers of order predictions met and violated were reported. 
Also, the assumption was tested by formulating a non-equally spaced circumplex model (non-equal 
communalities) in CIRCUM. Model fit was assessed by looking at the Chi-squared/df ratio, RMSEA, 
CFI and TLI (see footnote #6). 
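
The fit indicators named above can be computed from chi-squared values with their conventional textbook formulas, as the following Python sketch shows. The function names are hypothetical. The sample size entering RMSEA is assumed here to be the number of students for the Mplus models and the number of classes for the CIRCUM models; under that assumption the formula reproduces the Australian RMSEA values reported in Table 4. The null-model chi-squared and degrees of freedom used for CFI and TLI are not reported in this article, so the values in the example are invented for illustration only.

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Squared Error of Approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_null, df_null):
    """Comparative Fit Index: model misfit relative to the null model."""
    d_model = max(chi2 - df, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null


def tli(chi2, df, chi2_null, df_null):
    """Tucker Lewis Index, based on chi-squared/df ratios."""
    ratio_null = chi2_null / df_null
    return (ratio_null - chi2 / df) / (ratio_null - 1.0)


# Australia, exploratory model (Mplus): chi2 = 79.947, df = 14, 726 students
print(round(rmsea(79.947, 14, 726), 3))  # 0.081
# Australia, circular order model (CIRCUM): chi2 = 31.06, df = 10, 35 classes
print(round(rmsea(31.06, 10, 35), 3))    # 0.249

# Hypothetical null model (chi2 = 2000, df = 28), for illustration only
print(cfi(79.947, 14, 2000.0, 28), tli(79.947, 14, 2000.0, 28))
```

The two RMSEA calls illustrate why values from Mplus and CIRCUM are not comparable: the same formula is applied to very different effective sample sizes.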

The fourth assumption was tested by formulating an equally-spaced circumplex model 
(non-equal communalities) in CIRCUM. Again, model fit was assessed by looking at the model fit 
indicators. 

The fifth and last assumption (see footnote #7) was tested by formulating an ideal circumplex model 
in Mplus. In this model, two factors were specified, with fixed factor loadings according to Figure 1, 
and the correlation between the two factors was set to zero. Model fit was assessed by looking at the 
model fit indicators. 

Finally, it was determined to what degree empirical locations of sectors deviated from their 
theoretical (i.e. ideal) position. Empirical factor loadings were taken from the exploratory factor 
models used for testing assumption 1. Then, angular dislocation, relative angle distance and vector 
length of sectors were computed (Blackburn & Renwick, 1996). Finally, SPSS was used to compute 
correlations between empirical and ideal dimension scores. To achieve this, dimension scores were 
computed for each teacher with the empirical factor loadings and with the ideal factor loadings. 
6. Results 

6.1 Reliability, consistency and variance at class level 

Reliability of the sector scores at the class level was above .80 in most countries (see Table 3). 
In most countries, reliability was lowest for the student responsibility/freedom sector (SC) and strict 
sector (DO). On average, reliability was highest for Australia and Singapore. Intra-class correlations of 
sectors generally were above .20, but varied from sector to sector and from country to country: the 
largest differences between classes existed in the Dutch sample (average intra-class correlation of .41), 
while the smallest differences existed for Australia, the USA and Brunei (average intra-class 
correlations of .24, .25 and .24, respectively). This meant that the QTI was better able to detect 
differences between teachers in the Netherlands and Slovakia than in the USA, Australia and Brunei. 
Consistency across classes (Multilevel Lambda) was high (.80 or above) for nearly all sectors in all 
countries; the exceptions were the dissatisfied (OS) sector in Slovakia (.50) and the leadership (DC) 
sector in Brunei (.75). 

Footnote #5: RMSEA, CFI and TLI have the advantage that they are not related to sample size. RMSEA is a 
measure that takes the amount of residual variance into account, while CFI and TLI are indices that compare 
the fit of the model to no model at all (null model). CFI and TLI values should be above .95 for adequate fit. 

Footnote #6: TLI and CFI are not provided by CIRCUM, but can be calculated by hand from available model 
fit indicators. 

Footnote #7: The sixth assumption, assuming equal error variance across sectors, was not tested. 



Table 3 

Cronbach's alpha (reliability), intra-class correlations (ICC) and Multilevel Lambda (consistency) of 
QTI sectors at the teacher/class level. 

                   DC    CD    CS    SC    SO    OS    OD    DO   Average
Alpha
  Australia       .96   .94   .95   .84   .93   .94   .86   .76     .90
  United States   .92   .94   .90   .84   .91   .88   .84   .76     .87
  Netherlands     .93   .94   .97   .73   .92   .84   .83   .81     .87
  Slovakia        .90   .89   .77   .85   .86   .86   .92   .86     .86
  Singapore       .94   .95   .95   .76   .94   .94   .91   .76     .89
  Brunei          .83   .89   .75   .80   .79   .93   .87   .82     .84
ICC
  Australia       .27   .28   .25   .22   .28   .22   .25   .17     .24
  United States   .25   .30   .25   .28   .21   .21   .28   .22     .25
  Netherlands     .50   .43   .40   .28   .48   .35   .39   .43     .41
  Slovakia        .49   .30   .19   .27   .33   .04   .44   .44     .31
  Singapore       .34   .34   .34   .16   .28   .24   .35   .15     .28
  Brunei          .14   .24   .18   .18   .18   .32   .38   .29     .24
Lambda
  Australia       .89   .89   .87   .85   .89   .85   .87   .81     .87
  United States   .87   .90   .87   .89   .84   .84   .89   .85     .87
  Netherlands     .96   .95   .94   .91   .96   .93   .94   .95     .94
  Slovakia        .96   .92   .86   .91   .93   .50   .95   .95     .87
  Singapore       .95   .95   .95   .87   .93   .92   .95   .86     .92
  Brunei          .75   .85   .80   .80   .80   .90   .92   .88     .84

6.2 Testing circumplex assumptions 

According to the first assumption, the eight scales of the questionnaire should be represented 
by two factors or dimensions. Exploratory factor analyses (maximum likelihood) on the 
uncontaminated class-level correlation matrices showed that this was the case for all six countries. 
Moreover, the two factors explained considerable amounts of variance: 78.8 percent in the Australian 
data set, 81.5 percent in the American data, 79.0 percent for the Dutch data, 79.7 percent for the 
Slovakian data, 73.8 percent for the Singapore data and 76.2 percent for the Brunei data. Also, the 
second dimension explained more than 10 percent of additional variance in all data sets. This indicated 
that student perception data of the QTI could be structured in terms of two interpersonal dimensions in 
all countries. 

Analyses testing freely estimated two-factor models that allow for correlation between the 
factors (Mplus) also indicated close fit to the data (see Table 4). CFI values were above .95 in all data 
sets, while TLI values were at or above .95 for the USA, Slovakia and Brunei and were close to .90 for 
the other countries. RMSEA was close to .05 in most cases, being the highest for Australia (.081) and the 
Netherlands (.067). Chi-squared values were significant in all cases, which is mainly due to the 
relatively large sample sizes for the individual level part of the data. While fit was far from perfect, it 
seemed that two dimensions structured the perceptions of interpersonal behavior as measured with the 
QTI in all countries. 

The second assumption states that the two interpersonal dimensions should be uncorrelated. 
Table 4 lists the results of confirmatory factor analyses where the correlation has been set to zero. As 
can be seen, this hardly changed CFI, TLI or RMSEA values, indicating that it is very probable that 
the two dimensions are independent. In the Australian data set, chi-squared even became smaller if the 
correlation was set to zero, indicating better model fit. In the other countries, chi-squared only 
increased by small amounts. 



Table 4 

Fit indicators of Models Testing Circumplex Assumptions 

Country / Model                              Chi-squared   df   p-value   RMSEA   CFI   TLI
Australia
  Exploratory model (ass. 1)                      79.947   14   <.000     .081    .97   .88
  Independent dimensions model (ass. 2)           75.589   13   <.000     .081    .97   .88
  Circular order model (ass. 3)                   31.06    10   <.000     .249    .93   .82
  Equally spaced circumplex model (ass. 4)        95.36    17   <.000     .368    .71   .60
  Ideal circumplex model (ass. 5)                194.995   26   <.000     .095    .92   .83
United States
  Exploratory model (ass. 1)                      41.353   12   <.000     .055    .99   .95
  Independent dimensions model (ass. 2)           44.663   13   <.000     .055    .99   .95
  Circular order model (ass. 3)                  276.58    10   <.000     .885    .45   .82
  Equally spaced circumplex model (ass. 4)       112.09    17   <.000     .406    .81   .71
  Ideal circumplex model (ass. 5)                115.538   26   <.000     .066    .97   .94
Netherlands
  Exploratory model (ass. 1)                      61.366   12   <.000     .067    .99   .93
  Independent dimensions model (ass. 2)           67.220   13   <.000     .067    .98   .93
  Circular order model (ass. 3)                   56.36    10   <.000     .269    .92   .76
  Equally spaced circumplex model (ass. 4)       291.92    17   <.000     .503    .50   .17
  Ideal circumplex model (ass. 5)                234.931   26   <.000     .094    .94   .87
Slovakia
  Exploratory model (ass. 1)                      26.199   13   .0160     .046    .99   .97
  Independent dimensions model (ass. 2)           27.776   14   .0152     .045    .99   .97
  Circular order model (ass. 3)                   20.90    10   .0230     .253    .92   .77
  Equally spaced circumplex model (ass. 4)        57.36    17   <.000     .374    .70   .50
  Ideal circumplex model (ass. 5)                109.200   26   <.000     .081    .95   .90
Singapore
  Exploratory model (ass. 1)                      79.099   13   <.000     .064    .98   .92
  Independent dimensions model (ass. 2)           83.052   14   <.000     .063    .98   .92
  Circular order model (ass. 3)                   43.32    10   <.000     .261    .91   .75
  Equally spaced circumplex model (ass. 4)       128.48    17   <.000     .366    .70   .51
  Ideal circumplex model (ass. 5)                182.892   26   <.000     .070    .95   .90
Brunei
  Exploratory model (ass. 1)                      33.515   13   .0014     .050    .99   .95
  Independent dimensions model (ass. 2)           37.550   13   <.000     .054    .99   .95
  Circular order model (ass. 3)                   30.94    10   <.000     .248    .92   .78
  Equally spaced circumplex model (ass. 4)        87.71    17   <.000     .350    .74   .56
  Ideal circumplex model (ass. 5)                132.437   26   <.000     .080    .94   .88

Note: Since CIRCUM cannot handle multilevel data, Chi-squared, RMSEA, TLI and CFI values cannot be 
compared between models for assumptions 3 or 4 and models for the other assumptions (tested with Mplus). 



According to the third assumption, sectors should display ordering in a circular structure. 
Analyses testing this assumption on the uncontaminated class-level correlation matrices (RANDALL) 
showed relatively high correspondence indices (CI) for all countries. In the Australian data, CI was .84 
(p=.0008), with 264 out of 288 predictions met and 24 not met. For the USA, a CI of .78 (p=.0004) 
was found, with 255 out of 288 predictions met; for the Netherlands CI was .83 (p=.0004) with 262 
predictions met; for Slovakia CI was .65 (p=.0012) with 238 predictions met; for Singapore CI was .67 
(p=.0008) with 240 predictions met; and for Brunei CI was .70 (p=.0004) with 244 predictions met. 
These outcomes indicated that the sector intercorrelations corresponded with a circular ordering best 
for Australia and the Netherlands and worst for Slovakia and Singapore. Nevertheless, CI values were 
sufficiently high to support a circular ordering in all countries. 
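
The relation between the number of predictions met and the correspondence index can be sketched in Python. The sketch assumes the usual definition of the index as the proportion of order predictions met minus the proportion violated, and it assumes that every prediction not met counts as violated (no ties). Under these assumptions the counts reported for Slovakia and Singapore reproduce the reported values; for the other countries the result differs by about .01, possibly because tied predictions are ignored here. The function name is hypothetical and is not part of RANDALL.

```python
def correspondence_index(met, total=288):
    """(predictions met - predictions violated) / total, assuming no ties."""
    violated = total - met
    return (met - violated) / total


print(round(correspondence_index(238), 2))  # Slovakia: 0.65
print(round(correspondence_index(240), 2))  # Singapore: 0.67
```

An index of 0 would mean that as many predictions are violated as are met, while 1 indicates a perfect circular ordering.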

A more stringent test of circular ordering of the scales is provided by confirmatory analyses 
specifying a non-equally spaced circumplex (non-equal communalities). According to Table 4, there 
was no support for a circular order model in any of the countries, since Chi-squared values were 
significant and RMSEA values were unacceptably high. One possible reason for this may be that some 
of the scales take almost similar positions on the interpersonal circle, such as CD and CS, or OS and 
OD. As a result, the sectors displayed an elliptical structure, rather than a circular structure. 

According to the fourth assumption, scales should be equally distributed over the interpersonal 
circle. Models testing the uncontaminated class-level correlation matrices for this assumption showed 
poor model fit, with significant Chi-squared values and very high RMSEA values (see Table 4). Thus, 
it seemed unlikely that sectors were equally distanced in any of the countries. 

Finally, the fifth assumption even specified exact locations of the sectors on the interpersonal 
circle. Fit indicators of models testing this assumption within a multilevel set-up (Mplus) indicated 
that this assumption could not be supported. This was not surprising, given the outcomes with respect 
to assumptions 3 and 4. Chi-squared values were significant, RMSEA values were higher than 
acceptable, while TLI values were lower than acceptable. Nevertheless, CFI values were acceptable 
for some countries (USA, Slovakia and Singapore), and nearly acceptable in the other countries. 

In conclusion, it seemed that the QTI represented the MITB reasonably well: two independent 
dimensions structured the sector scores, and the sectors could be ordered in a circular structure. 
However, scales were not equally distributed over the circle, nor did they occupy similar distances to 
the circle center, nor did they take the exact positions hypothesized by the model. 

6.3 Difference between empirical and theoretical model 

The above results raise the question to what degree the empirical model behind the data is 
different from the theoretical ideal. To investigate this, we have used the factor loadings found for 
models that tested the first assumption. These factor loadings are graphically displayed in Figures 2 to 
7, and in numerical format in Table 5. In Table 5, the absolute average difference between empirical 
and theoretical factor loadings per dimension is also given. As can be seen in Table 5, differences 
between empirical and theoretical factor loadings are largest for Slovakia for the influence dimension, 
while they are largest for Singapore for the proximity dimension. They are smallest for Singapore for 
the influence dimension and smallest for the Netherlands for the proximity dimension. 
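
The "Diff" values in Table 5 can be illustrated with a short Python sketch. The ideal loadings are not listed in this section, so the values used below (plus or minus .92 and .38) are an assumption; with these values, the mean absolute difference for the Australian loadings comes out at about .264 for influence and .325 for proximity, which agrees with the .26 and .33 reported in Table 5. The function name is hypothetical.

```python
# Assumed ideal loadings, sector order DC, CD, CS, SC, SO, OS, OD, DO
IDEAL_DS = [0.92, 0.38, -0.38, -0.92, -0.92, -0.38, 0.38, 0.92]   # influence
IDEAL_CO = [0.38, 0.92, 0.92, 0.38, -0.38, -0.92, -0.92, -0.38]   # proximity

# Empirical loadings for Australia (Table 5)
AUS_DS = [0.72, 0.07, 0.18, -0.70, -1.00, -0.12, 0.20, 0.62]
AUS_CO = [0.60, 1.42, 1.14, 0.70, -0.08, -1.13, -1.39, -0.74]


def mean_abs_diff(empirical, ideal):
    """Average absolute difference between empirical and ideal loadings."""
    return sum(abs(e - i) for e, i in zip(empirical, ideal)) / len(ideal)


diff_ds = mean_abs_diff(AUS_DS, IDEAL_DS)
diff_co = mean_abs_diff(AUS_CO, IDEAL_CO)
print(diff_ds, diff_co)
```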
Figure 2 

Australia - factor loadings 

Figure 3 

USA - factor loadings 

Figure 4 

Netherlands - factor loadings 

Figure 5 

Slovakia - factor loadings 

Figure 6 

Singapore - factor loadings 

Figure 7 

Brunei - factor loadings 

Figures 2 to 7 confirmed that in most countries an elliptic, rather than circular, structure could 
be found, and that some of the scales hardly occupied different positions in the interpersonal plane. 
Table 5 

Empirical factor loadings in six countries. 

Scale    Australia       United States   Netherlands     Slovakia        Singapore
          DS     CO       DS     CO       DS     CO       DS     CO       DS     CO
DC        .72    .60      .73    .71      .92    .47      .76    .54      .82    .44
CD        .07   1.42      .30   1.29      .37    .90      .17    .93      .34   1.24
CS        .18   1.14      .15   1.02      .24    .92     -.05    .97      .20   1.24
SC       -.70    .70     -.77    .49     -.28    .39     -.06    .89     -.94   1.37
SO      -1.00   -.08     -.85   -.35     -.92   -.14     -.57   -.70    -1.15    .06
OS       -.12  -1.13     -.22   -.89     -.07   -.81      .30   -.78     -.19  -1.20
OD        .20  -1.39      .00   -.92      .13   -.80      .36   -.83      .25  -1.74
DO        .62   -.74      .46   -.57      .38   -.54      .76   -.51      .68  -1.34
Diff      .26    .33      .25    .15      .30    .09      .35    .18      .19    .52

Note: Diff = absolute average difference between theoretical and empirical factor loadings for the dimension. 



The next step consisted of establishing the exact difference between empirical and theoretical 
positions of the sectors. Therefore, angular dislocation was computed (see footnote #8) in terms of 
degrees (Table 6) and in terms of relative angle distance (Table 7). 
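
The two measures can be sketched in Python as follows. The conventions used are inferred from the reported values rather than taken from the footnote, and should be read as assumptions: a sector's angle is measured counter-clockwise from the proximity (CO) axis, the ideal sector angles lie at 22.5 degrees from the axes, dislocation is the empirical minus the ideal angle (negative meaning clockwise), and relative angle distance is 1 minus the absolute dislocation divided by 180. With the Australian loadings of Table 5, these conventions reproduce the DC and CD entries of Tables 6 and 7. Function names are hypothetical.

```python
import math

# Assumed ideal sector angles (degrees, counter-clockwise from the proximity axis)
IDEAL_ANGLE = {"DC": 67.5, "CD": 22.5, "CS": -22.5, "SC": -67.5,
               "SO": -112.5, "OS": -157.5, "OD": 157.5, "DO": 112.5}


def angular_dislocation(influence, proximity, sector):
    """Empirical minus ideal angle in degrees, wrapped to the range [-180, 180)."""
    empirical = math.degrees(math.atan2(influence, proximity))
    diff = empirical - IDEAL_ANGLE[sector]
    return (diff + 180.0) % 360.0 - 180.0


def relative_angle_distance(dislocation):
    """1 = sector at its ideal position, 0 = sector at the opposite side of the circle."""
    return 1.0 - abs(dislocation) / 180.0


# Australia (Table 5): DC loads .72 on influence and .60 on proximity; CD loads .07 and 1.42
dc = angular_dislocation(0.72, 0.60, "DC")
cd = angular_dislocation(0.07, 1.42, "CD")
print(round(dc, 2), round(relative_angle_distance(dc), 2))  # -17.31 0.9
print(round(cd, 2), round(relative_angle_distance(cd), 2))  # -19.68 0.89
```

Because relative angle distance is a simple rescaling of the absolute dislocation, Tables 6 and 7 carry the same information; Table 7 only drops the direction of the shift.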



Table 6 

Angular dislocation in degrees. 

                    DC      CD      CS      SC      SO      OS      OD      DO
Australia        -17.31  -19.68  -31.47  -19.41   27.07  -16.44   14.31   27.54
United States    -21.71   -9.41  -30.87    9.97    0.12   -8.62   22.44   28.60
Netherlands       -4.56   -0.15  -37.12   31.82   31.15  -17.56   13.27   32.37
Slovakia         -23.42  -15.95  -24.99   64.68  -19.90  -45.27   17.56   28.66
Singapore          5.72    7.17   31.66  -33.04  -25.49   13.50  -14.32  -40.59
Brunei            10.59   -2.80  -20.75  -30.63  -33.12   20.26   -7.62  -33.07

All sectors shifted from their ‘theoretical’ position to some degree. This was particularly true 
for the understanding (CS), student responsibility/freedom (SC), and strict (DO) sectors in most 
countries. The understanding (CS) sector had moved more than 30 degrees clockwise in the Dutch, 
Australian and American samples, but in a counter-clockwise direction in the Singapore sample. The 
student responsibility/freedom (SC) sector had moved more than 30 degrees counter-clockwise in the 
Dutch and Slovak samples and more than 30 degrees in clockwise direction in the Singapore and 
Brunei samples. Thus, these two sectors were relatively closely positioned in the Dutch sample, but 
relatively distant in the Singapore sample. The strict (DO) sector had moved considerably 
counter-clockwise in the Dutch sample, but clockwise in the Singapore and Brunei samples. Thus, in the 
Dutch sample strictness incorporated less influence than theoretically hypothesized and more proximity, 
while the opposite was true for Brunei and Singapore. 
Table 7 

Relative angular dislocation. 

                   DC    CD    CS    SC    SO    OS    OD    DO   Average dislocation
Australia         .90   .89   .83   .89   .85   .91   .92   .85   .88
United States     .88   .95   .83   .95   .99   .95   .88   .84   .91
Netherlands       .98  1.00   .79   .82   .83   .90   .93   .82   .88
Slovakia          .87   .91   .86   .64   .89   .75   .90   .84   .83
Singapore         .97   .96   .82   .82   .86   .93   .92   .77   .88
Brunei            .94   .98   .88   .83   .82   .89   .96   .82   .89
Overall           .92   .95   .84   .83   .87   .89   .92   .82


Table 7 is another means of displaying the “shifts” in position of sectors; values closer to 1 
indicate less dislocation. The outcomes are therefore similar to those of Table 6. As can be seen, the 
strict sector (DO) was most dislocated (.82), followed by the student responsibility/freedom (SC) sector 
(.83) and understanding (CS) sector (.84). The sectors least dislocated were the helpful/friendly (CD) 
sector (.95), leadership (DC) sector (.92) and admonishing (OD) sector (.92). When comparing the mean 
or average dislocation per country, it could be seen that for the American data set sectors were least 
dislocated (.91), while they were most dislocated for the Slovakian data set (.83). However, differences 
in average dislocation between countries were small. 

Apart from angular dislocation, dislocation was also determined with respect to distance from 
the circle center (see footnote #9 for computation). Ideally, all sectors should have equal distance to 
the circle center. Vector length is presented in Table 8. 
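
A minimal Python sketch of this computation is given below. It assumes that vector length is the Euclidean norm of a sector's two factor loadings, an assumption that reproduces the entries of Table 8 from the loadings in Table 5 for the cases shown. The function name is hypothetical.

```python
import math


def vector_length(influence, proximity):
    """Distance of a sector from the circle center in the two-factor plane."""
    return math.hypot(influence, proximity)


# Loadings taken from Table 5
print(round(vector_length(0.72, 0.60), 2))   # Australia DC: 0.94
print(round(vector_length(0.07, 1.42), 2))   # Australia CD: 1.42
print(round(vector_length(-0.94, 1.37), 2))  # Singapore SC: 1.66
```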



Table 8 

Vector length (distance to circle centre) of sectors of QTI. 

                   DC    CD    CS    SC    SO    OS    OD    DO   Mean   Variance
Australia         .94  1.42  1.15  1.05  1.00  1.14  1.40   .97   1.14   .167
United States    1.02  1.32  1.03   .91   .92   .92   .92   .73    .97   .157
Netherlands      1.03   .97   .95   .48   .93   .81   .81   .66    .83   .171
Slovakia         1.32  1.23   .92  1.22  1.25  1.22  1.63  1.52   1.29   .200
Singapore         .93  1.29  1.26  1.66  1.15  1.21  1.76  1.50   1.35   .258
Brunei           1.06  1.22   .98   .65  1.14  1.28  1.48  1.08   1.11   .227

Table 8 displays a number of interesting patterns. First, vector length varied per sector from 
country to country, with the largest differences found for the student responsibility/freedom (SC) 
sector (ranging between .48 for the Netherlands and 1.66 for Singapore), the admonishing (OD) sector 
(ranging between .81 for the Netherlands and 1.76 for Singapore) and strict (DO) sector (ranging 
between .66 for the Netherlands and 1.52 for Slovakia). Second, according to the ideal (theoretical) 
circumplex model, variance in vector length should be minimal (zero). As can be seen, considerable 
variance in vector length was found within each country, with the USA displaying the least variance 
and Singapore the most variance. Also, it can be seen that the average distance of scales to the circle 
center was smallest for the Netherlands (.83) and largest for Singapore (1.35). 

Finally, we computed correlations between empirical and theoretical dimension scores 
(see footnote #10). These correlations are given in Table 9. As can be seen, in all countries a close 
resemblance was found between empirical and theoretical dimension scores. For proximity, 
correlations ranged between .956 for the Brunei sample and .998 for the Dutch and Slovak 
samples. For influence, correlations ranged between .899 for the Dutch sample and .993 for the 
Australian sample. These high correlations show that, despite irregularities found with the MITB in 
each of the countries (as described in testing assumptions 1 to 5), the QTI was capable of closely 
reproducing theoretically expected dimension scores in all countries. 



Table 9 

Correlations between ideal and empirical dimension scores of QTI. 

                 Influence   Proximity
Australia          .993        .993
United States      .922        .997
Netherlands        .899        .998
Slovakia           .973        .998
Singapore          .972        .977
Brunei             .967        .956

Summarizing the above findings, it can be concluded that irregularities from the ideal 
circumplex mainly related to dislocation in the understanding, student responsibility/freedom and strict 
sectors. While country-to-country differences were found, the QTI was capable of closely reproducing 
ideal dimension scores in all countries. 



7. Discussion 

In this article, the validity of the QTI was compared for samples of secondary science teachers 
in six countries. The QTI is supposed to be an adequate operationalization of the MITB, which can be 
regarded as a manifestation of a circumplex model. Validity was investigated in terms of testing 5 
assumptions that lie behind circumplex models. 

In general, the outcomes of the study support the existence of two dimensions (assumption 1), 
which are uncorrelated (assumption 2). Some support was found for a circular ordering of the sectors 
(assumption 3), but no support was found for equal distancing of sectors across the circle (assumption 
4) or for specific hypothesized locations of the sectors on the circle (assumption 5). Thus, a 
two-dimensional circular structure may lie behind the QTI, but empirical positions of the sectors on this 
circle deviate from their theoretically ideal positions. These deviations particularly occurred for the 
strict (DO), understanding (CS) and student responsibility/freedom (SC) sectors. 

Some of the outcomes of the study are particularly interesting. First, country-to-country 
differences were found in the positions of the sectors on the interpersonal circle. For example, student 
responsibility/freedom and understanding occupied nearly similar positions on the circle in the 
Netherlands, while a large distance between them was found in Singapore. The understanding sector 
contained less influence than expected in Australia, the Netherlands and the USA, while it included 
more influence than was expected in Singapore; the student responsibility/freedom sector had more 
influence and proximity than expected in the Netherlands and Slovakia, but less influence and 
proximity than expected in Singapore and Brunei. Also, strictness seemed to incorporate less influence 
in the Netherlands than hypothesized, while it incorporated more influence than hypothesized in 
Singapore and Brunei. These between-country differences might reflect different meanings or 
connotations attached to the sectors of interest. However, they might also reflect differences in focus: 
in some countries strictness or student responsibility may be observed from different cues than in other 
countries, or cues may be valued differently in terms of importance, even if (some) equivalence in 
meaning exists. Finally, the differences might be caused by the QTI development process (which was 
different from country to country), sampling procedure or characteristics (which also differed from 
country to country). Further in-depth research on between-country differences in sector positions is 
needed. This research should include interviews with students and/or teachers regarding the meanings 
attached to certain concepts, particularly strictness, understanding and student responsibility. 
Interesting questions would be whether Dutch students regard understanding and student responsibility 
as similar concepts and why Singaporean students regard them as (completely) different. Also, it 
would be interesting to see if students from different countries use different cues to form their 
perceptions (observe different elements). For example, is strictness inferred from the same verbal and 
non-verbal behaviors by Dutch and Singaporean or Brunei students? Also, it could be investigated 
how important these differences really are. In this study, for example, it was found that while sectors 
occupied different positions than hypothesized and between-country differences were found in this 
respect, correlations between empirical and theoretical dimension scores were extremely high in all 
countries and hardly varied between countries. Also, earlier research has shown that the interpersonal 
dimensions and sectors display similar relationships with student outcomes in different countries, both 
in terms of direction and magnitude (e.g. Brekelmans, Wubbels, & den Brok, 2002; Fisher et al.,
1996). 

From a research point of view, the fact that some sectors display a large amount of overlap may
lead to the conclusion that some sectors can be omitted from the model. Another conclusion may be
that different items should be added to the instrument in order to alter the position of the sector(s) on the circle.
Despite such possible alterations, the instrument in its current form may still be interesting and 
necessary from a feedback or training point of view. 

Another interesting conclusion that may be drawn from the outcomes of this study is that the
short versions of the QTI from the USA and the Netherlands are almost as valid as their longer
equivalents. This may lead to considerable shortening of the instruments for use in future research or
professional development activities.

The methods used in this study may be of particular importance for the field of learning 
environments research, and science learning environments in particular. First of all, it has been 
concluded that research in this domain hardly uses multilevel analysis techniques, while the data in
most cases are hierarchical in nature (e.g. Fraser, 1998). Therefore, using techniques that adjust for 
multi-stage sampling (non-random data) may lead to more precise estimates of effects and 
correlational structures of interest. Sometimes, using such techniques can also show interesting 
interactions between variables, and may alter previous findings completely (e.g. den Brok, Levy, 
Wubbels & Brekelmans, in press). Research on validity of learning environment instruments may 
particularly benefit from the use of Structural Equation Modelling (SEM) techniques, such as those used in
this study. Structural Equation Modelling allows researchers to test specific hypotheses or assumptions
behind their data, provides indices for the fit of a model to a particular data set and provides 
researchers with the means to take measurement error into account. One could, for example, apply 
these techniques to test whether the learning environment dimensions of Moos (1979) structure scale 
scores of instruments such as the Science Laboratory Environment Inventory (SLEI), Classroom 
Environment Scale (CES) or What Is Happening In this Class (WIHIC). Such investigations would 
lead to increased insight into the structuring of student perceptions and the internal validity of such 
instruments. 
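
To make the point about hierarchical data concrete, the following sketch computes an intraclass correlation, ICC(1), from a one-way analysis of variance with students nested in classes. It is a minimal illustration under stated assumptions, not the multilevel procedure used in this study: the function name `icc1` is hypothetical, the ratings are invented toy values rather than study data, and classes are assumed to be of equal size.

```python
from statistics import mean


def icc1(groups):
    """One-way random-effects ICC(1) for equally sized groups.

    `groups` is a list of classes, each a list of student ratings.
    Raises ZeroDivisionError if every rating is identical.
    """
    k = len(groups[0])  # students per class
    j = len(groups)     # number of classes
    if j < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("expects at least two classes of equal size (>= 2)")

    grand = mean(x for g in groups for x in g)
    class_means = [mean(g) for g in groups]

    # Mean squares between and within classes.
    ms_between = k * sum((m - grand) ** 2 for m in class_means) / (j - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, class_means) for x in g
    ) / (j * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# HYPOTHETICAL toy ratings for three classes of four students each.
TOY_RATINGS = [
    [3.1, 3.4, 2.9, 3.3],
    [4.2, 4.0, 4.4, 4.1],
    [2.0, 2.5, 2.2, 2.4],
]
print(round(icc1(TOY_RATINGS), 2))
```

A high value indicates that most of the variance in ratings lies between classes, which is why analyses that treat student ratings as independent observations can misstate the precision of estimates.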

While the study compensated for some earlier limitations, the study itself was also subject to 
limitations. First, sampling procedures and characteristics varied between countries. This may have led 
to between-country differences in validity. Also, sample sizes varied from country to country and were 
quite small for some countries. Research has shown that in order to achieve stable results, more than 
50 teachers/classes are needed (Hox & Maas, 2001). Also, while a selection of items was used for 
three countries to enhance comparison, and these selections still appeared to display reasonable 
validity, it may well be that the ‘complete’ instruments would show different results, and even better
validity. Thus, this study to some degree ignored differences in instrument development between countries.
Furthermore, validity testing concentrated on the class level, while the individual level of the data was not
subjected to validity testing. Earlier research showed that different structures may apply to these 
levels. Testing structures at these levels between countries would also have been interesting. Finally, 
while the structure behind perceptions was tested, it was not tested whether differences in 
interpersonal behavior occurred between science teachers of different countries, either in terms of 
sectors or dimensions. The results of this study show that such cross national comparisons may only 
be valid in terms of dimension scores, but not in terms of sector scores. Also, other aspects of validity, 
such as predictive validity, were not compared between countries. Neither did the study investigate 
whether differences in interpersonal styles existed between countries. Researchers may use the 
outcomes of the present study to make such comparisons in future research with some reassurance that 
the QTI displays similar structures in different countries.

Finally, the QTI appears to be cross-country comparable in terms of validity. This opens up
opportunities to start international studies on science teaching practice, joint research or professional 
development projects between countries, or to use the instruments in multicultural and international
school settings. It may help researchers, supervisors, school management and teachers to develop a
common language to discuss their practice and include teacher behavior in large-scale science 
education research efforts such as TIMSS. 

Appendix 1

Uncontaminated class-level correlation matrices for the 6 countries. 



Australia

       DC     CD     CS     SC     SO     OS     OD     DO
DC   1.00
CD    .80   1.00
CS    .83    .93   1.00
SC   -.30    .16   -.03   1.00
SO   -.93   -.59   -.69    .61   1.00
OS   -.75   -.90   -.87    .03    .64   1.00
OD   -.47   -.73   -.80   -.11    .34    .88   1.00
DO    .18   -.23   -.08   -.87   -.49    .16    .36   1.00



United States

       DC     CD     CS     SC     SO     OS     OD     DO
DC   1.00
CD    .79   1.00
CS    .69    .94   1.00
SC    .07    .53    .61   1.00
SO   -.77   -.52   -.39    .40   1.00
OS   -.57   -.87   -.89   -.58    .27   1.00
OD   -.48   -.79   -.84   -.52    .27    .91   1.00
DO    .02   -.52   -.60   -.81   -.22    .65    .67   1.00


Netherlands

.01

.58


Brunei

       DC     CD     CS     SC     SO     OS     OD     DO
DC   1.00
CD    .71   1.00
CS    .53    .85   1.00
SC    .18    .71    .73   1.00
SO   -.45    .04   -.31    .63   1.00
OS   -.54   -.74   -.95   -.62   -.28   1.00
OD   -.33   -.65   -.94   -.69   -.40    .94   1.00
DO   -.16   -.71   -.83   -.83   -.61    .75    .82   1.00